Abstract

A binary string of length $2^k$ induces the Boolean function of $k$ variables whose Shannon
expansion is the given binary string. This Boolean function is then representable via a unique
reduced ordered binary decision diagram (ROBDD). The given binary string is fully recoverable
from this ROBDD. We exhibit a lossless data compression algorithm in which a binary string
whose length is a power of two is compressed via compression of the ROBDD associated with it as
described above. We show that when binary strings of length $n$, a power of two, are compressed
via this algorithm, the maximal pointwise redundancy per sample with respect to any $s$-state binary
information source has the upper bound $(4 \log_2 s + 16 + o(1))/\log_2 n$. To establish this result, we
exploit a result of Liaw and Lin stating that the ROBDD representation of a Boolean function
of $k$ variables contains a number of vertices on the order of $(2 + o(1))2^k/k$.

Index Terms: lossless source coding, universal codes, Boolean functions, ROBDD representations

I Introduction



Let $\mathcal{S}(\mathrm{dyadic})$ denote the set of all binary strings $x$ such that

• The length of x is a power of two. 

• The substring of x which forms the left half of x does not coincide with the substring of x 
which forms the right half of x. 

• x contains at least one entry of 1 and at least one entry of 0.

Let $x \in \mathcal{S}(\mathrm{dyadic})$ and let $k$ be the logarithm to the base two of the length of $x$. In a natural
way, $x$ induces a Boolean function $f_x$ of $k$ variables. The function $f_x$ maps the set $\{0,1\}^k$ into
the set $\{0,1\}$, and can be defined as follows. Let $u_1, u_2, \ldots, u_{2^k}$ be the lexicographical ordering
of all binary strings of length $k$. For each $i = 1, 2, \ldots, 2^k$, define $f_x(u_i)$ to be the $i$-th coordinate
of $x$. Following [1] [2], the Boolean function $f_x$ can be represented by a directed acyclic graph
called a reduced ordered binary decision diagram (ROBDD). Since the Boolean function $f_x$ can
be recovered from its ROBDD representation, $x$ can also be recovered from this representation.
This means that we can losslessly compress a string $x \in \mathcal{S}(\mathrm{dyadic})$ by compressing the ROBDD
representation of the Boolean function $f_x$ induced by $x$. The purpose of this note is to
investigate the compression performance that is achievable by such a compression algorithm (we
obtain a redundancy bound).
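The correspondence between a string $x$ and the function $f_x$ can be sketched in a few lines of Python. This is only an illustration: the helper name `induced_function` is hypothetical, and inputs are indexed from 0 rather than from 1 as in the text.

```python
from itertools import product

def induced_function(x: str):
    """Return (k, f) where f is the Boolean function of k variables induced by x.

    Hypothetical helper for illustration; x must be a binary string whose
    length is a power of two and at least 2.
    """
    n = len(x)
    k = n.bit_length() - 1
    if n < 2 or n != 1 << k or set(x) - {"0", "1"}:
        raise ValueError("x must be a binary string of length 2^k with k >= 1")

    def f(*bits: int) -> int:
        # The input, read as a binary numeral, is its 0-based rank in the
        # lexicographic ordering of all binary strings of length k.
        index = int("".join(str(b) for b in bits), 2)
        return int(x[index])

    return k, f

k, f = induced_function("0111")
# Evaluating f on the inputs in lexicographic order reproduces x.
recovered = "".join(str(f(*u)) for u in product((0, 1), repeat=k))
```

Listing the values of `f` in lexicographic order of the inputs returns the original string, which is the recoverability property used throughout.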

There are public domain software packages (some of them on the Internet) for computing
the ROBDD representation of a Boolean function. Such a package would be easily adaptable
in order to provide compression of a data string in $\mathcal{S}(\mathrm{dyadic})$ according to the ROBDD-based
compression algorithm that we shall present.

II ROBDDs Representing Data Strings

Define $\mathcal{G}$ to be the set of all finite graphs $G$ such that

(a) G is directed and acyclic. 

(b) G has a unique nonterminal vertex $V^*$ such that for every other vertex $V$ of $G$, there is at
least one directed path leading from $V^*$ to $V$. The vertex $V^*$ is called the root vertex of
$G$.

(c) There are exactly two terminal vertices of $G$, which shall be denoted $T^0$ and $T^1$, respectively.

(d) From each nonterminal vertex of G, there emanate exactly two edges, one of which is
labelled "0" and the other of which is labelled "1". These edges terminate at different
vertices (i.e., G has no multiple edges).

(e) Each vertex $V$ of $G$ carries a positive integer label $L(V)$, which we shall call the level of $V$.
The levels of the vertices satisfy the properties:

• $L(V^*) = 1$.

• $L(T^0) = L(T^1)$.

• If $V_1, V_2, \ldots, V_j$ are the vertices visited in order along any directed path in $G$, then
$L(V_1) < L(V_2) < \cdots < L(V_j)$. (Note: The labels $L(V_1), L(V_2), \ldots, L(V_j)$ are not
necessarily consecutive integers.)

Example 1. The graph given in Figure 1, in which each vertex is labelled by its level, is seen
to satisfy the properties (a)-(e). Therefore, this graph is a member of $\mathcal{G}$.

Let $G \in \mathcal{G}$. Let $V(G)$ be the set of vertices of $G$. Let $\{0,1\}^+$ denote the set of all binary
strings of finite positive length. We define $\phi_G$ to be the unique mapping from $V(G)$ into $\{0,1\}^+$
such that

• $\phi_G(T^0) = 0$ and $\phi_G(T^1) = 1$.

• If $V$ is a nonterminal vertex of $G$, if the edge labelled 0 emanating from $V$ terminates at
vertex $V_0$, and the edge labelled 1 emanating from $V$ terminates at vertex $V_1$, then

$$\phi_G(V) = \phi_G(V_0)^{2^{L(V_0)-L(V)-1}} \, \phi_G(V_1)^{2^{L(V_1)-L(V)-1}}.$$

(Notation: If $y$ is a binary string and $j$ is a positive integer, then $y^j$ denotes the binary string
obtained by concatenating together $j$ copies of $y$. If $y_1$ and $y_2$ are binary strings, then $y_1 y_2$
denotes the binary string obtained by concatenating $y_2$ onto the right end of $y_1$.)

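Example 2. Let $A_1, A_2, \ldots, A_{16}$ denote the sixteen vertices of the graph $G$ in Figure 1,
as indicated in Figure 2. (This is a "canonical ordering" of the vertices of $G$, which shall be
explained later.) Since $A_8 = T^0$ and $A_{16} = T^1$, we have

$$\begin{aligned}
\phi_G(A_8) &= 0 \\
\phi_G(A_{16}) &= 1 \\
\phi_G(A_9) &= \phi_G(A_8)\phi_G(A_{16}) = 01 \\
\phi_G(A_{10}) &= \phi_G(A_8)^2\phi_G(A_{16})^2 = 0011 \\
\phi_G(A_{11}) &= \phi_G(A_9)\phi_G(A_{16})^2 = 0111 \\
\phi_G(A_{12}) &= \phi_G(A_8)^4\phi_G(A_{16})^4 = 00001111 \\
\phi_G(A_{13}) &= \phi_G(A_9)^2\phi_G(A_{16})^4 = 01011111 \\
\phi_G(A_{14}) &= \phi_G(A_{10})\phi_G(A_{16})^4 = 00111111 \\
\phi_G(A_{15}) &= \phi_G(A_{11})\phi_G(A_{16})^4 = 01111111 \\
\phi_G(A_4) &= \phi_G(A_8)^8\phi_G(A_9)^4 = 0000000001010101 \\
\phi_G(A_5) &= \phi_G(A_{10})^2\phi_G(A_{11})^2 = 0011001101110111 \\
\phi_G(A_6) &= \phi_G(A_{12})\phi_G(A_{13}) = 0000111101011111 \\
\phi_G(A_7) &= \phi_G(A_{14})\phi_G(A_{15}) = 0011111101111111 \\
\phi_G(A_2) &= \phi_G(A_4)\phi_G(A_5) = \text{length 32 string} \\
\phi_G(A_3) &= \phi_G(A_6)\phi_G(A_7) = \text{length 32 string} \\
\phi_G(A_1) &= \phi_G(A_2)\phi_G(A_3) = \text{length 64 string}
\end{aligned}$$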
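The computation in Example 2 can be reproduced mechanically. The sketch below assumes the edge list of relation (3.1) and the vertex levels listed after Example 5, and it assumes the repetition rule that each child string is repeated $2^{L(\text{child}) - L(V) - 1}$ times, which is the rule consistent with Example 2 and Lemma 1. The names `EDGES`, `LEVEL`, `TERMINAL_BIT`, and `phi` are illustrative only.

```python
from functools import lru_cache

# Edges (0-child, 1-child) of the Figure 1 graph in canonical numbering,
# transcribed from relation (3.1); vertices 8 and 16 are the terminals.
EDGES = {
    1: (2, 3), 2: (4, 5), 3: (6, 7), 4: (8, 9), 5: (10, 11),
    6: (12, 13), 7: (14, 15), 9: (8, 16), 10: (8, 16), 11: (9, 16),
    12: (8, 16), 13: (9, 16), 14: (10, 16), 15: (11, 16),
}
# Vertex levels, transcribed from the table following Example 5.
LEVEL = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 7, 9: 6,
         10: 5, 11: 5, 12: 4, 13: 4, 14: 4, 15: 4, 16: 7}
TERMINAL_BIT = {8: "0", 16: "1"}  # A_8 = T^0 and A_16 = T^1

@lru_cache(maxsize=None)
def phi(v: int) -> str:
    """String represented by vertex v (assumed repetition rule, see lead-in)."""
    if v in TERMINAL_BIT:
        return TERMINAL_BIT[v]
    v0, v1 = EDGES[v]
    # A child that lies d levels below v contributes 2^(d-1) copies of its string.
    reps0 = 2 ** (LEVEL[v0] - LEVEL[v] - 1)
    reps1 = 2 ** (LEVEL[v1] - LEVEL[v] - 1)
    return phi(v0) * reps0 + phi(v1) * reps1
```

With these tables, `phi(13)` evaluates to `01011111` and `phi(1)` has length 64, in agreement with the list above.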



The following is clear from the definition of $\phi_G$ and Example 2.

Lemma 1 Let $G$ be any graph in $\mathcal{G}$. Suppose $L(T^0) = L(T^1) = k + 1$. Then, for each vertex $V$
of $G$, the length of $\phi_G(V)$ is $2^{k+1-L(V)}$. In particular, the length of $\phi_G(V^*)$ is $2^k$.

Definition. We define $\mathcal{G}^*$ to be the set of all graphs $G \in \mathcal{G}$ such that the mapping $\phi_G$ is
one-to-one.

Lemma 2 The following statements hold:

(a) For any $G \in \mathcal{G}^*$, the binary string $\phi_G(V^*)$ is a member of $\mathcal{S}(\mathrm{dyadic})$.

(b) For each $x \in \mathcal{S}(\mathrm{dyadic})$, there is a unique $G \in \mathcal{G}^*$ such that $\phi_G(V^*) = x$. In the language of
[1] [2], this unique graph $G$ is the unique ROBDD representing the Boolean function $f_x$.

Proof. Part (a) is clear from Example 2. Part (b) (including the uniqueness of the ROBDD 
representation) may be seen to be true by consulting the papers [1] [2]. 

Notation. For each $x \in \mathcal{S}(\mathrm{dyadic})$, we let $G_x$ denote the unique graph in $\mathcal{G}^*$ which represents
$x$ in the sense of Lemma 2(b).

Example 3. The graph $G$ in Figure 1 is $G_x$, where $x \in \mathcal{S}(\mathrm{dyadic})$ is found from Example 2
by the calculation

$$\begin{aligned}
x &= \phi_G(A_4)\phi_G(A_5)\phi_G(A_6)\phi_G(A_7) \\
  &= 0000000001010101\ 0011001101110111\ 0000111101011111\ 0011111101111111
\end{aligned}$$
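Going in the other direction, a graph with the structure of $G_x$ can be built directly from $x$ by repeated halving. The sketch below is not taken from [1] [2]; it assumes that each vertex can be identified with the string it represents (which is what the one-to-one condition on $\phi_G$ permits), and the names `build_robdd` and `X` are hypothetical.

```python
def build_robdd(x: str):
    """Return (root, children) for a reduced diagram representing x.

    Vertices are named by the strings they represent. `children` maps each
    nonterminal vertex to its (0-child, 1-child); the terminals are "0" and "1".
    """
    children = {}

    def reduce(s: str) -> str:
        if len(s) == 1:
            return s                      # terminal vertex
        half = len(s) // 2
        left, right = s[:half], s[half:]
        if left == right:
            return reduce(left)           # no test needed here: skip the vertex
        if s not in children:             # identical substrings share one vertex
            children[s] = (reduce(left), reduce(right))
        return s

    return reduce(x), children

# The 64-bit string of Example 3.
X = ("0000000001010101" "0011001101110111"
     "0000111101011111" "0011111101111111")
root, children = build_robdd(X)
vertex_count = len(children) + 2          # nonterminals plus the two terminals
```

For the string of Example 3 this yields 16 vertices, matching the sixteen vertices $A_1, \ldots, A_{16}$ of Figure 1.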



III Encoding Method 

For each $G \in \mathcal{G}^*$, we shall define in this section a binary codeword $\sigma(G)$ from which $G$ can
be recovered. Given $x \in \mathcal{S}(\mathrm{dyadic})$, we can then losslessly encode $x$ into the binary codeword
$\sigma(G_x)$.

We need the following notation. If $G$ is a graph in $\mathcal{G}^*$, and $V$ is a nonterminal vertex of $G$,
then the notation

$$V \to V_0, V_1$$

means that $V_0$ is the vertex of $G$ to which edge 0 from $V$ leads, and $V_1$ is the vertex of $G$ to
which edge 1 from $V$ leads.

Fix $G \in \mathcal{G}^*$, and let $j$ be the number of vertices of $G$. We define a canonical ordering of
the vertices of $G$. Let $A_1, A_2, \ldots, A_j$ be the enumeration of the vertices of $G$ which is uniquely
determined by the two properties

Property (i): $A_1 = V^*$.

Property (ii): If $q_1 < q_2 < \cdots < q_{j-2}$ are the integers in $\{1, 2, \ldots, j\}$ such that $A_{q_1}, A_{q_2}, \ldots, A_{q_{j-2}}$
are the nonterminal vertices of $G$, and if we write

$$\begin{aligned}
A_{q_1} &\to A_{r_1}, A_{s_1} \\
A_{q_2} &\to A_{r_2}, A_{s_2} \\
&\ \ \vdots \\
A_{q_{j-2}} &\to A_{r_{j-2}}, A_{s_{j-2}}
\end{aligned}$$

then, if we list the distinct entries of the sequence

$$A_{r_1}, A_{s_1}, A_{r_2}, A_{s_2}, \ldots, A_{r_{j-2}}, A_{s_{j-2}}$$

in order of their first left-to-right appearances in this sequence, we get the list $A_2, A_3, \ldots, A_j$.

Example 4. The canonical ordering of the vertices of the graph $G$ in Figure 1 is given in
Figure 2. We can determine this ordering by generating the following relations one by one:

$$\begin{aligned}
A_1 &\to A_2, A_3 \\
A_2 &\to A_4, A_5 \\
A_3 &\to A_6, A_7 \\
A_4 &\to A_8, A_9 \\
A_5 &\to A_{10}, A_{11} \\
A_6 &\to A_{12}, A_{13} \\
A_7 &\to A_{14}, A_{15} \\
A_9 &\to A_8, A_{16} \\
A_{10} &\to A_8, A_{16} \\
A_{11} &\to A_9, A_{16} \\
A_{12} &\to A_8, A_{16} \\
A_{13} &\to A_9, A_{16} \\
A_{14} &\to A_{10}, A_{16} \\
A_{15} &\to A_{11}, A_{16}
\end{aligned} \qquad (3.1)$$

Notice that in (3.1), vertices $A_8$ and $A_{16}$ are missing from the left-hand sides. This means that
$A_8$ and $A_{16}$ are the terminal vertices of the graph in Figure 2. One of these vertices is equal to
$T^0$ and the other is equal to $T^1$. We cannot determine which is the case from (3.1) alone. We
would need an extra bit of information to determine which of the two possibilities

$$A_8 = T^0, \quad A_{16} = T^1$$

$$A_8 = T^1, \quad A_{16} = T^0$$

holds.
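Properties (i) and (ii) amount to numbering the vertices in the order in which they are first reached when nonterminal vertices are expanded in index order, with edge 0 before edge 1. A minimal sketch of this reading follows; the function name `canonical_order` and the toy vertex names are hypothetical.

```python
def canonical_order(root, edges):
    """List the vertices in canonical order A_1, A_2, ..., A_j.

    `edges` maps each nonterminal vertex to its (0-child, 1-child);
    terminal vertices are simply absent from `edges`.
    """
    order = [root]                # Property (i): A_1 is the root
    seen = {root}
    position = 0
    while position < len(order):  # expand vertices in index order
        vertex = order[position]
        position += 1
        for child in edges.get(vertex, ()):
            if child not in seen:  # keep first left-to-right appearances only
                seen.add(child)
                order.append(child)
    return order

# A toy graph with arbitrary vertex names: r -> t0, m and m -> t0, t1.
toy = canonical_order("r", {"r": ("t0", "m"), "m": ("t0", "t1")})

# The relations (3.1), which are already written in canonical numbering.
EDGES_3_1 = {
    1: (2, 3), 2: (4, 5), 3: (6, 7), 4: (8, 9), 5: (10, 11),
    6: (12, 13), 7: (14, 15), 9: (8, 16), 10: (8, 16), 11: (9, 16),
    12: (8, 16), 13: (9, 16), 14: (10, 16), 15: (11, 16),
}
figure_order = canonical_order(1, EDGES_3_1)
```

Applied to (3.1), the procedure returns the vertices in the order 1 through 16, so the numbering of Figure 2 is a fixed point of the ordering, as it should be.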

Let $G \in \mathcal{G}^*$, let $k$ be the positive integer such that $L(T^0) = L(T^1) = k + 1$, and let
$A_1, \ldots, A_j$ be the canonical ordering of the vertices of $G$. We will generate strings $S_1, S_2, \ldots, S_{k+1}$
in which

• $S_1 = (A_1)$, and each entry of each $S_i$ is a member of the set of symbols

$$\{A_m^q : m = 1, 2, \ldots, j,\ q = 1, 2, \ldots\},$$

where $A_m^1$ is written simply as $A_m$.

• The strings $S_1, S_2, \ldots, S_{k+1}$, taken together, allow one to build the graph $G$ (except for
the determination of which of the two terminal vertices equals $T^0$, and which equals $T^1$,
which takes one more bit of information, as discussed above).

• Each $S_i$ ($i \ge 2$) is generated recursively from $S_{i-1}$ and certain side information, and the
side information from each recursive step is what is encoded to form the overall codeword
$\sigma(G)$. From $\sigma(G)$, the decoder can then recursively generate the $\{S_i\}$, from which $G$ is
obtained.

Fix $i$, where $2 \le i \le k + 1$. The following procedure describes how $S_i$ is recursively generated
from $S_{i-1}$:

Step (i): Write down the string $U$ consisting of the first appearances (from left to right) of each
distinct symbol appearing in $S_{i-1}$.

Step (ii): For each entry of $U$ of form $A_m^q$, where $q > 1$, write below that entry the entry $A_m^{q-1}$.

Step (iii): For each entry of $U$ of form $A_m$, write down below that entry the two entries
$A_{m_0}^{q_0}, A_{m_1}^{q_1}$, where $A_{m_0}, A_{m_1}$ are the respective vertices to which edges 0 and 1 from $A_m$
lead, and $q_0, q_1$ are the positive integers

$$q_0 = L(A_{m_0}) - L(A_m), \qquad q_1 = L(A_{m_1}) - L(A_m).$$

Step (iv): Concatenate together the sequence of entries written below the entries of $U$ in Steps
(ii) and (iii). The resulting sequence is $S_i$.

Example 5. For the graph $G$ in Figure 2, the strings $S_1, S_2, \ldots, S_7$ are as follows:

$$\begin{aligned}
S_1 &= (A_1) \\
S_2 &= (A_2, A_3) \\
S_3 &= (A_4, A_5, A_6, A_7) \\
S_4 &= (A_8^4, A_9^3, A_{10}^2, A_{11}^2, A_{12}, A_{13}, A_{14}, A_{15}) \\
S_5 &= (A_8^3, A_9^2, A_{10}, A_{11}, A_8^3, A_{16}^3, A_9^2, A_{16}^3, A_{10}, A_{16}^3, A_{11}, A_{16}^3) \\
S_6 &= (A_8^2, A_9, A_8^2, A_{16}^2, A_9, A_{16}^2, A_{16}^2) \\
S_7 &= (A_8, A_8, A_{16}, A_{16})
\end{aligned}$$

Let $G \in \mathcal{G}^*$, let $k$ be the positive integer such that $L(T^0) = k + 1$, and let $A_1, A_2, \ldots, A_j$
be the canonical ordering of the vertices of $G$. One easily determines from $S_1, S_2, \ldots, S_{k+1}$ the
level of each vertex $A_1, A_2, \ldots, A_j$. For each $A_i$, find the unique $S_m$ such that $A_i$ is an entry of
$S_m$. Then, $L(A_i) = m$. To illustrate, from $S_1, S_2, \ldots, S_7$ in Example 5, we determine that

$$\begin{array}{llll}
L(A_1) = 1 & L(A_2) = 2 & L(A_3) = 2 & L(A_4) = 3 \\
L(A_5) = 3 & L(A_6) = 3 & L(A_7) = 3 & L(A_8) = 7 \\
L(A_9) = 6 & L(A_{10}) = 5 & L(A_{11}) = 5 & L(A_{12}) = 4 \\
L(A_{13}) = 4 & L(A_{14}) = 4 & L(A_{15}) = 4 & L(A_{16}) = 7
\end{array}$$

Referring to Figure 2, we see that this assignment is correct.
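Steps (i)-(iv) can be traced with a short program. The sketch below represents a symbol $A_m^q$ as the pair `(m, q)` and takes the edges of (3.1) and the level table as given; `generate_strings` and the table names are hypothetical, and the power offsets are read as level differences.

```python
EDGES = {
    1: (2, 3), 2: (4, 5), 3: (6, 7), 4: (8, 9), 5: (10, 11),
    6: (12, 13), 7: (14, 15), 9: (8, 16), 10: (8, 16), 11: (9, 16),
    12: (8, 16), 13: (9, 16), 14: (10, 16), 15: (11, 16),
}
LEVEL = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 7, 9: 6,
         10: 5, 11: 5, 12: 4, 13: 4, 14: 4, 15: 4, 16: 7}

def generate_strings(edges, level, root=1):
    """Return [S_1, ..., S_{k+1}], each a list of (m, q) pairs meaning A_m^q."""
    strings = [[(root, 1)]]
    for _ in range(2, max(level.values()) + 1):
        # Step (i): distinct symbols of S_{i-1} in order of first appearance.
        distinct = list(dict.fromkeys(strings[-1]))
        current = []
        for m, q in distinct:
            if q > 1:
                current.append((m, q - 1))          # Step (ii)
            else:
                for child in edges[m]:              # Step (iii): edge 0, then edge 1
                    current.append((child, level[child] - level[m]))
        strings.append(current)                     # Step (iv)
    return strings

S = generate_strings(EDGES, LEVEL)
lengths = [len(s) for s in S]
# A vertex sits at the level of the string in which it appears with power 1.
recovered_level = {m: i + 1 for i, s in enumerate(S) for m, q in s if q == 1}
```

The string lengths come out as 1, 2, 4, 8, 12, 7, 4, and the levels recovered from the strings coincide with the table above.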
One also easily determines from $S_1, \ldots, S_{k+1}$ where each edge of $G$ begins and ends. For
each nonterminal vertex $A_i$, find the unique $m < k + 1$ such that $A_i$ is an entry of $S_m$, and then
look below in $S_{m+1}$ to find the corresponding two consecutive entries $A_{i_0}^{q_0}, A_{i_1}^{q_1}$; vertices $A_{i_0}$ and
$A_{i_1}$ are then the respective vertices at which edges 0 and 1 from $A_i$ terminate. To illustrate,
from $S_1, S_2, \ldots, S_7$ in Example 5, we get the edge description given in (3.1), which we see is
correct by referring to Figure 2.

For a graph $G \in \mathcal{G}^*$ such that $L(T^0) = L(T^1) = k + 1$, we suppose that the strings
$S_1, S_2, \ldots, S_{k+1}$ have been generated. We now describe how these strings are encoded for
transmission to the decoder. The decoder already knows that $S_1 = (A_1)$. In addition to this, the
decoder needs to know:

(a) How to obtain $S_i$ from $S_{i-1}$, for each $i = 2, \ldots, k + 1$. This information is transmitted to
the decoder using $M_i$ codebits. In the sequel, we shall explain what these $M_i$ codebits
consist of.

(b) For the two symbols $A_{j_0}$ and $A_{j_1}$ comprising the entries of $S_{k+1}$, the decoder needs to know
which of these symbols equals $T^0$. This information is transmitted to the decoder using
one codebit.

From the above description, we see that a total of $(M_2 + \cdots + M_{k+1}) + 1$ codebits is transmitted
to the decoder by the encoder. We need to further explicate Step (a) above, so that it is
understood what $M_i$ is. To do this, we need a number of definitions.

Definition 1. If $u = (u_1, u_2, \ldots, u_J)$ is any nonempty sequence of finite length over any
alphabet $\mathcal{A}$, we define

$$H(u) \triangleq \sum_{j=1}^{J} -\log_2 \frac{n(u_j)}{J},$$

where, for each $a \in \mathcal{A}$, $n(a)$ is the number of $1 \le j \le J$ for which $u_j = a$. If $u$ is an empty
sequence, we define $H(u) = 0$. The quantity $H(u)$ is important for the following reason: If the
set $\{u_1, u_2, \ldots, u_J\}$ is known, and if the frequencies with which the symbols in this set appear in
$u$ are known, the sequence $u$ can be losslessly encoded using $\lceil H(u) \rceil$ codebits. This is because
there are no more than $2^{H(u)}$ sequences having the known symbol frequencies.
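A direct transcription of Definition 1 may help in checking small cases by hand. The function name `empirical_entropy` is hypothetical; the formula is the one in the definition.

```python
from collections import Counter
from math import ceil, log2

def empirical_entropy(u) -> float:
    """H(u): the sum over all positions j of -log2(n(u_j) / J); 0 for empty u."""
    J = len(u)
    if J == 0:
        return 0.0
    n = Counter(u)                       # n[a] = number of positions holding a
    return sum(-log2(n[a] / J) for a in u)

four_distinct = empirical_entropy(["a", "b", "c", "d"])   # four symbols, once each
constant = empirical_entropy(["a"] * 4)                   # a single repeated symbol
bits_needed = ceil(empirical_entropy(["a", "a", "b"]))    # ceiling of H(u)
```

For four distinct symbols each term equals 2, so $H(u) = 8$; a constant sequence has $H(u) = 0$, since only one sequence has the given frequencies.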

Definition 2. If $u$ is a sequence of finite length, $|u|$ denotes the length of $u$.

Definition 3. Let $u$ be any nonempty sequence of finite length over any alphabet. We define
$\tilde{u}$ to be the (possibly empty) sequence obtained from $u$ by striking out each term of $u$ which is
making its first left-to-right appearance in $u$. For example, if

$$u = (a, a, b, a, b, c, b, b, c, a), \qquad (3.2)$$

we strike out the first, third, and sixth terms, obtaining

$$\tilde{u} = (a, a, b, b, b, c, a)$$

It could be that $u$ is empty. In this case, we define $\tilde{u}$ to be the empty sequence.

Definition 4. If $u$ is a sequence of finite length such that $H(\tilde{u}) > 0$, we define $h(u) =
|u| + H(\tilde{u})$. If $u$ is a sequence of finite length such that $H(\tilde{u}) = 0$, we define $h(u) = 0$. Here is
why the quantity $h(u)$ is important: If the frequencies with which the symbols appearing in $u$
are known, and if the list of these symbols in order of first left-to-right appearance in $u$ is known,
then the sequence $u$ can be losslessly encoded using $\lceil h(u) \rceil$ codebits. To see this, one can encode
$\tilde{u}$ using $\lceil H(\tilde{u}) \rceil$ codebits. Then, one can obtain $u$ from $\tilde{u}$ with an additional $|u|$ codebits (these
additional codebits tell the decoder the positions in $u$ where the first left-to-right appearances
of the symbols in $u$ occur). This gives us a total of $|u| + \lceil H(\tilde{u}) \rceil = \lceil h(u) \rceil$ codebits. (We have
assumed $H(\tilde{u}) > 0$. The reader can treat the case $H(\tilde{u}) = 0$ separately.) For example, if $u$ is
the sequence in (3.2), the additional $|u|$ codebits are $(1, 0, 1, 0, 0, 1, 0, 0, 0, 0)$, the ones indicating
first appearances of $a, b, c$ in $u$ in positions 1, 3, 6, respectively.
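Definitions 3 and 4 can be checked on the sequence (3.2) with the sketch below. It assumes the reading in which the struck-out sequence is the argument of $H$ in the definition of $h$; all function names are hypothetical.

```python
from collections import Counter
from math import log2

def empirical_entropy(u) -> float:
    """H(u) as in Definition 1 (0 for the empty sequence)."""
    J = len(u)
    n = Counter(u)
    return sum(-log2(n[a] / J) for a in u) if J else 0.0

def strike_first_appearances(u):
    """Return (remaining terms, flags); flags[j] is 1 where a symbol first appears."""
    seen, remaining, flags = set(), [], []
    for a in u:
        if a in seen:
            remaining.append(a)
            flags.append(0)
        else:
            seen.add(a)
            flags.append(1)
    return remaining, flags

def h(u) -> float:
    """h(u) = |u| + H(struck-out u) when that entropy is positive, else 0."""
    remaining, _ = strike_first_appearances(u)
    entropy = empirical_entropy(remaining)
    return len(u) + entropy if entropy > 0 else 0.0

u = list("aababcbbca")                    # the sequence in (3.2)
remaining, flags = strike_first_appearances(u)
```

Here `remaining` spells `aabbbca` and `flags` is the position vector $(1, 0, 1, 0, 0, 1, 0, 0, 0, 0)$ quoted in the text.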

Definition 5. For each $2 \le i \le k + 1$, we let $\hat{S}_i$ be the subsequence of $S_i$ that arises from
substituting for the distinct entries of $S_{i-1}$ of form $A_m$. (Recall that each such entry of $S_{i-1}$
generates two entries of $S_i$.)

Definition 6. An entry of $\hat{S}_i$ of form $A_m^q$, where $A_m^{q+1}$ appears in $S_{i-1}$, shall be called a Type
I entry of $\hat{S}_i$. We let $\pi_i^1$ denote the subsequence of $\hat{S}_i$ consisting of all the Type I entries of $\hat{S}_i$.

Definition 7. An entry of $\hat{S}_i$ of form $A_m^q$, where the symbol $A_m^{q+1}$ does not appear in $S_{i-1}$,
shall be called a Type II entry of $\hat{S}_i$. We let $\pi_i^2$ denote the subsequence of $\hat{S}_i$ consisting of all
the Type II entries of $\hat{S}_i$. Suppose that there are $r$ distinct entries of $\pi_i^2$, and that $A_m$ is the
vertex of highest index $m$ that has appeared in the sequences $S_1, \ldots, S_{i-1}$. Then, if we list
the distinct entries of $\pi_i^2$ in order of their first left-to-right appearances in $\pi_i^2$, this list will take
the form

$$A_{m+1}^{q_1}, A_{m+2}^{q_2}, \ldots, A_{m+r}^{q_r} \qquad (3.3)$$

Definition 8. We let $Q_i$ be the nonnegative integer consisting of the sum of all the powers $q$
as $A_m^q$ ranges through all of the distinct terms of $\pi_i^2$. (In other words, referring to (3.3), $Q_i$ is
equal to $q_1 + q_2 + \cdots + q_r$.)

With the above definitions, we can now stipulate that

$$M_i = |S_i| + |\hat{S}_i| + Q_i + \lceil H(\pi_i^1) \rceil + \lceil h(\pi_i^2) \rceil. \qquad (3.4)$$

Here is how the different terms in $M_i$ arise:

(a.1) Encoder transmits to decoder $|S_i|$ codebits to let the decoder know the frequency with
which each distinct element of $S_i$ appears.

(a.2) Encoder transmits to decoder $|\hat{S}_i|$ codebits so that the decoder will know which entries of
$\hat{S}_i$ are of Type I and which entries are of Type II.

(a.3) Encoder transmits to decoder $Q_i$ codebits so that the decoder will know the powers $q$
appearing in the Type II entries $A_m^q$ of $\hat{S}_i$.

(a.4) The encoder transmits to the decoder $\lceil H(\pi_i^1) \rceil$ codebits, which tell the decoder what $\pi_i^1$
is.

(a.5) The encoder transmits to the decoder $\lceil h(\pi_i^2) \rceil$ codebits, which tell the decoder what $\pi_i^2$
is.

Definition. We let $\sigma(G)$ be the binary codeword of length $(M_2 + \cdots + M_{k+1}) + 1$ obtained
by concatenating together the codebits from Steps (a.1)-(a.5), (b) above.

Example 6. We explain how the decoder can obtain $S_5$ from $S_4$ in Example 5. Initially, the
decoder will know that $S_5$ takes the form

$$S_5 = (A_8^3, A_9^2, A_{10}, A_{11}, \hat{S}_5),$$

where the entries of $\hat{S}_5$ have to be filled in. The decoder knows that the length of $S_5$ is 12. The
decoder looks at the first 12 codebits that are currently in its codebit buffer, to determine the
frequencies of the distinct entries of $S_5$. In this case, these 12 codebits are

0,1,0,1,0,1,0,1,0,0,0,1

which tell the decoder that $A_8^3$ appears twice in $S_5$, $A_9^2$ appears twice in $S_5$, $A_{10}$ appears twice
in $S_5$, $A_{11}$ appears twice in $S_5$, and an element of form $A_{16}^q$, with $q$ unknown, appears four times
in $S_5$. The decoder now knows that $\pi_5^1$ is of length four and consists of one appearance of each
of the symbols $A_8^3, A_9^2, A_{10}, A_{11}$, and that $\pi_5^2$ is of length four and consists of four appearances
of the symbol $A_{16}^q$. The next $|\hat{S}_5| = 8$ codebits in the decoder's buffer tell the decoder which
entries of $\hat{S}_5$ are of Type I and which are of Type II. In this case, these codebits are

(0,1,0,1,0,1,0,1),

which tell the decoder that the entries of $\hat{S}_5$ alternate between Type I entries and Type II entries,
starting with a Type I entry. The decoder now needs to determine the power $q$ in the symbol
$A_{16}^q$. In this case, $q = 3$, and the decoder will know this because the codebits

(0,0,1)

will appear at the start of the decoder's codebit buffer at this point. The next
$\lceil H(A_8^3, A_9^2, A_{10}, A_{11}) \rceil = 8$ codebits tell the decoder that

$$\pi_5^1 = (A_8^3, A_9^2, A_{10}, A_{11})$$

The decoder already knows that

$$\pi_5^2 = (A_{16}^3, A_{16}^3, A_{16}^3, A_{16}^3),$$

so that, putting $\pi_5^1$ and $\pi_5^2$ together, the decoder has determined that

$$\hat{S}_5 = (A_8^3, A_{16}^3, A_9^2, A_{16}^3, A_{10}, A_{16}^3, A_{11}, A_{16}^3)$$
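The bit patterns quoted in Example 6 are consistent with a unary convention in which a positive integer $f$ is sent as $f - 1$ zeros followed by a one. The text does not state this convention explicitly, so the sketch below should be read as an assumption inferred from the example; `decode_unary_counts` is a hypothetical name.

```python
def decode_unary_counts(bits):
    """Split a bit sequence into integers, each coded as (f - 1) zeros then a one."""
    counts, run = [], 0
    for b in bits:
        run += 1
        if b == 1:            # a one closes the current codeword
            counts.append(run)
            run = 0
    if run:
        raise ValueError("incomplete unary codeword at end of input")
    return counts

# The 12 frequency bits of Example 6: four symbols appearing twice, one four times.
frequencies = decode_unary_counts([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1])
# The 3 bits that convey the power q of the Type II symbol.
power = decode_unary_counts([0, 0, 1])
```

Under this reading the decoded frequencies sum to $|S_5| = 12$ and the power bits sum to $Q_5 = 3$, which is why (a.1) and (a.3) cost exactly $|S_i|$ and $Q_i$ codebits. With the values of Example 6, formula (3.4) would then give $M_5 = 12 + 8 + 3 + 8 + 0 = 31$.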



IV Performance Bound 

Let $G \in \mathcal{G}^*$, and let $L(T^0) = k + 1$. The binary codeword $\sigma(G)$ results by encoding the sequences
$S_2, S_3, \ldots, S_{k+1}$, plus the transmission of an extra codebit to signal the decoder which of the
two terminal vertices of $G$ is equal to $T^0$. In this section, we want to upper bound the codeword
length $|\sigma(G)|$, in order to see how good the encoder is.

From the previous section, it can be seen that

$$|\sigma(G)| \le 4\left[|S_1| + |S_2| + \cdots + |S_{k+1}|\right] + \sum_{i=2}^{k+1}\left[\lceil H(\pi_i^1) \rceil + \lceil H(\tilde{\pi}_i^2) \rceil\right] \qquad (4.5)$$

The only tricky part in obtaining this bound is the observation that

$$\sum_{i=2}^{k+1} Q_i \le |S_2| + |S_3| + \cdots + |S_{k+1}|$$

To see this, notice that if a Type II symbol $A_m^q$ appears in a sequence $S_i$, and $q > 1$, then the
$q - 1$ symbols $A_m^{q-1}, A_m^{q-2}, \ldots, A_m$ appear in the subsequent sequences $S_{i+1}, \ldots, S_{i+q-1}$. Summing the
powers $q$ for all such symbols $A_m^q$, one must obtain a quantity $Q_2 + \cdots + Q_{k+1}$ upper bounded
by $|S_2| + \cdots + |S_{k+1}|$.

Let $x$ be the binary string of length $2^k$ represented by $G$ (i.e., $\phi_G(V^*) = x$). Fix $i$ satisfying
$2 \le i \le k + 1$. From left to right, partition $x$ into disjoint substrings of length $2^{k-i+2}$, and let
$u_1, u_2, \ldots, u_M$ be the list of distinct substrings in this partition, listed in order of first left-to-right
appearance in the partition. For each $u_m$ in this list, let $u_m(L)$ denote the prefix of $u_m$ of
length $2^{k-i+1}$, and let $u_m(R)$ denote the suffix of $u_m$ of length $2^{k-i+1}$. (In other words, when
we bisect the string $u_m$, we obtain $u_m(L)$ on the left, and $u_m(R)$ on the right.) Replace each
$u_m$ in the sequence $(u_1, \ldots, u_M)$ for which $u_m(L) \ne u_m(R)$ by the pair of strings $u_m(L), u_m(R)$;
otherwise, if $u_m(L) = u_m(R)$, replace $u_m$ by $u_m(L)$. These replacements yield a new sequence
$V_i$ whose entries are substrings of $x$ of length $2^{k-i+1}$. The following properties can be proved
(see [3]).

Property 1: The sequence $S_i$ has the same length as the sequence $V_i$.

Property 2: Writing

$$S_i = (q_1, q_2, \ldots, q_M)$$

$$V_i = (r_1, r_2, \ldots, r_M)$$

the sets $\{q_1, q_2, \ldots, q_M\}$ and $\{r_1, r_2, \ldots, r_M\}$ are of the same size, and there is a one-to-one
mapping $\alpha_i$ from the first set onto the second set in which

$$V_i = (\alpha_i(q_1), \alpha_i(q_2), \ldots, \alpha_i(q_M))$$

Property 3: There is a partition $\Pi$ of $x$, and disjoint subsequences $s^2, s^3, \ldots, s^{k+1}$ of $\Pi$ (some
of which may be empty), such that

$$s^i = V_i, \qquad 2 \le i \le k + 1$$

Definitions. We let $\Lambda$ denote the family of all mappings $\lambda : \{0,1\}^+ \to (0,1]$ such that for
every sequence $u \in \{0,1\}^+$, and every partition $(u_1, u_2, \ldots, u_r)$ of $u$ into nonempty substrings
of $u$,

$$\lambda(u) \le \lambda(u_1)\lambda(u_2) \cdots \lambda(u_r) \qquad (4.6)$$

If $\lambda \in \Lambda$, we define

$$|\lambda| = \sup_{n \ge 1} \sum_{y \in \{0,1\}^n} \lambda(y)$$

Lemma 3 Let $\lambda$ be a function in $\Lambda$ for which $|\lambda| < \infty$. Let $G \in \mathcal{G}^*$, let $x$ be the binary string
of length $2^k$ represented by $G$, and let $S_1, S_2, \ldots, S_{k+1}$ be the strings defined for $G$ according to
Section III. Then,

$$\sum_{i=2}^{k+1}\left[H(\pi_i^1) + H(\tilde{\pi}_i^2)\right] \le \left(\sum_{i=2}^{k+1} |S_i|\right)\log_2 |\lambda| - \log_2 \lambda(x) \qquad (4.7)$$

Proof. The sequences $\pi_i^1$ and $\tilde{\pi}_i^2$ are disjoint subsequences of $S_i$. Applying Property 2, we
have

$$H(\pi_i^1) + H(\tilde{\pi}_i^2) \le H(S_i) = H(V_i)$$

The entries of $V_i = (w_1, \ldots, w_T)$ are substrings of $x$ of length $2^{k-i+1}$. Let $\delta \le |\lambda|$ be the positive
constant such that

$$\mu(y) = \lambda(y)/\delta, \qquad y \in \{0,1\}^{2^{k-i+1}}$$

defines a probability distribution on $\{0,1\}^{2^{k-i+1}}$. Then,

$$\begin{aligned}
H(V_i) &\le -\log_2 \mu(w_1) - \log_2 \mu(w_2) - \cdots - \log_2 \mu(w_T) \\
       &\le |S_i| \log_2 |\lambda| - \sum_{t=1}^{T} \log_2 \lambda(w_t)
\end{aligned}$$

Summing the preceding inequality over $i$ in the range $2 \le i \le k + 1$, and using Property 3
together with the property (4.6) of $\lambda$, we obtain (4.7).

Lemma 4 There is a sequence of positive numbers $\{\epsilon_k : k = 1, 2, \ldots\}$ converging to zero such
that the following is true. For any $k = 1, 2, \ldots$ and any $G \in \mathcal{G}^*$ representing a binary string of
length $2^k$, if we let $S_1, S_2, \ldots, S_{k+1}$ be the strings defined from $G$ in Section III,

$$|S_1| + |S_2| + \cdots + |S_{k+1}| \le \frac{2^{k+1}(2 + \epsilon_k)}{k} \qquad (4.8)$$

Sketch of Proof. Suppose $G$ is any graph in $\mathcal{G}^*$ representing a binary string $x$ of length $2^k$.
Let $S(x)$ be the set of all binary strings which lie in the partitions of $x$ into substrings of length
$1, 2, 2^2, \ldots, 2^k$. Define the graph $G'$ to be the graph in which:

• The set of vertices of $G'$ is $S(x)$. The set of terminal vertices of $G'$ is $\{0, 1\}$.

• For each nonterminal vertex $u$ of $G'$, there are two edges emanating from $u$, one of which,
labelled edge 0, terminates at the left half of $u$, and the other of which, labelled edge 1,
terminates at the right half of $u$.

The graph $G'$ is isomorphic to the graph termed by Liaw and Lin [4] the quasi-reduced ordered
binary decision diagram corresponding to the ROBDD $G$. Let $V(G)$ be the set of vertices of $G$
and let $V(G')$ be the set of vertices of $G'$. It is proved in the paper [4] that there exists a sequence
of positive constants $\{\epsilon_k\}$ tending to zero such that for any $k$ and any $G \in \mathcal{G}^*$ representing a
binary string of length $2^k$,

$$|V(G)| \le |V(G')| \le \frac{2^k(2 + \epsilon_k)}{k} \qquad (4.9)$$

For the strings $S_1, S_2, \ldots, S_{k+1}$ defined for this same $G$ as in Section III, it can be shown (we
omit the proof here) that

$$|S_1| + |S_2| + \cdots + |S_{k+1}| \le |V(G')| + |V(G)| \qquad (4.10)$$

Combining (4.9) and (4.10), we obtain (4.8).
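As a sanity check on (4.10), the vertex set $S(x)$ of $G'$ can be enumerated for the 64-bit string of Example 3. The sketch below only counts dyadic blocks; the names `dyadic_blocks` and `X` are hypothetical.

```python
def dyadic_blocks(x: str) -> set:
    """S(x): all strings in the partitions of x into blocks of length 2^k, ..., 2, 1."""
    blocks = set()
    size = len(x)
    while size >= 1:
        # Partition x into consecutive blocks of the current size.
        blocks.update(x[p:p + size] for p in range(0, len(x), size))
        size //= 2
    return blocks

# The 64-bit string of Example 3.
X = ("0000000001010101" "0011001101110111"
     "0000111101011111" "0011111101111111")
quasi_reduced_vertices = len(dyadic_blocks(X))
```

For this string $|V(G')| = 25$, while $|V(G)| = 16$ and the strings of Example 5 have total length $1 + 2 + 4 + 8 + 12 + 7 + 4 = 38$, so (4.10) reads $38 \le 41$ in this instance.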
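Here is our main result.

Theorem 1 Consider an arbitrary binary $s$-state information source. For each binary string
$x$ of finite length, let $\mu(x)$ denote the probability assigned to $x$ by the given source. Then, for
$n = 2, 4, 8, 16, \ldots$,

$$\max\left\{\, |\sigma(G_x)| + \log_2 \mu(x) : x \in \{0,1\}^n \cap \mathcal{S}(\mathrm{dyadic}) \,\right\} \le \frac{n}{\log_2 n}\left(16 + 4\log_2 s + o(1)\right)$$

Proof. Fix a $\lambda \in \Lambda$ such that

• $|\lambda| \le s$.

• $\mu(y) \le \lambda(y)$ for every binary string $y$.

Fix $n \in \{2, 4, 8, \ldots\}$ and $x \in \{0,1\}^n \cap \mathcal{S}(\mathrm{dyadic})$. Let $G \in \mathcal{G}^*$ be the graph $G = G_x$. Let
$k = \log_2 n$, and let $S_1, S_2, \ldots, S_{k+1}$ be the strings constructed from $G$ according to Section III.
Applying Lemmas 3 and 4 to (4.5), and noting that the ceilings in (4.5) add at most $2k$ bits in total,

$$|\sigma(G)| \le 4\left[\frac{2^{k+1}(2 + \epsilon_k)}{k}\right] + \left[\frac{2^{k+1}(2 + \epsilon_k)}{k}\right]\log_2 s - \log_2 \mu(x) + 2k,$$

which gives us our result, since $2^k = n$, $k = \log_2 n$, and $2k = o(n/\log_2 n)$.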
Figure 1: A ROBDD G from Bryant [1] (left edges labelled 0, right edges labelled 1)